
PR to apply for E2E OLS evaluation framework for AAP chatbot #47

Merged: 5 commits merged into main from aap_38439 on Jan 31, 2025

Conversation

justjais

Description

PR to apply for E2E OLS evaluation framework for AAP chatbot

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Create a virtual environment and install the necessary packages via make install-deps; this requires cloning the ansible-chatbot-service repo first.
  • To run the E2E tests locally, configure olsconfig.yaml and copy it to the parent directory; a sample olsconfig.yaml for the AAP chatbot scenario is under the /scripts/evaluation/ folder.
  • Once configured, the evaluation framework can be run over the complete AAP QnA set (aap_doc_qna.parquet under the /scripts/evaluation/eval_data/ folder), or over the sample QnA set defined in the scripts/evaluation/eval_data/aap-sample.parquet file.
  • To run the eval framework, run the following command:
OPENAI_API_KEY=IGNORED python -m scripts.evaluation.driver --qna_pool_file /Users/sjaiswal/Sumit/wisdom/ansible-chatbot-service/scripts/evaluation/eval_data/aap-sample.parquet --eval_provider_model_id my_rhoai+granite3-8b --eval_metrics answer_relevancy answer_similarity_llm cos_score rougeL_precision --eval_modes ols_rag --judge_model granite3-8b --judge_provider my_rhoai --eval_query_ids qna1 qna2 qna3 qna4 qna5
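The setup steps above can be sketched end to end as shell commands. This is a minimal sketch: the repository URL is an assumption (only the repo name ansible-chatbot-service appears in this PR), and the provider/model IDs (my_rhoai, granite3-8b) are taken from the example command, so adjust them for your environment.

```shell
# Clone the service repo (URL is an assumption based on the repo name)
git clone https://github.com/ansible/ansible-chatbot-service.git
cd ansible-chatbot-service

# Create a virtual environment and install dependencies
make install-deps

# Copy the sample AAP olsconfig.yaml to the parent directory, as described above
cp scripts/evaluation/olsconfig.yaml ../olsconfig.yaml

# Run the evaluation over the sample QnA set (relative path used here
# instead of the absolute path shown in the original command)
OPENAI_API_KEY=IGNORED python -m scripts.evaluation.driver \
  --qna_pool_file scripts/evaluation/eval_data/aap-sample.parquet \
  --eval_provider_model_id my_rhoai+granite3-8b \
  --eval_metrics answer_relevancy answer_similarity_llm cos_score rougeL_precision \
  --eval_modes ols_rag \
  --judge_model granite3-8b \
  --judge_provider my_rhoai \
  --eval_query_ids qna1 qna2 qna3 qna4 qna5
```

To run over the full AAP QnA set instead, point --qna_pool_file at scripts/evaluation/eval_data/aap_doc_qna.parquet.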

@justjais justjais changed the title <WIP DNM>PR to apply for E2E OLS evaluation framework for AAP chatbot PR to apply for E2E OLS evaluation framework for AAP chatbot Jan 21, 2025
@justjais justjais marked this pull request as ready for review January 21, 2025 18:22
@TamiTakamiya
Collaborator

@justjais I have rebased ansible-chatbot-service to the latest upstream (road-core/service). Would you rebase to the current main branch? Sorry for causing extra work.


@TamiTakamiya TamiTakamiya left a comment


@justjais I think we want to update those evaluation-related files whenever OLS updates their files. For that purpose, would you add the following changes?

  1. Copy all files under scripts/evaluation in the OLS repo, even if they are not used. It will be easier for us to have them when we import changes from the OLS repo.
  2. If any additional files are required, give them file names that include aap-.
  3. Create scripts/evaluation/README-aap.md to document what was changed/added from/to the OLS code.
  4. Also document how to run the tool for the Ansible chatbot in the same scripts/evaluation/README-aap.md file.

Resolved review threads: scripts/evaluation/olsconfig.yaml (3 threads), scripts/evaluation/utils/relevancy_score.py (1 thread)
@TamiTakamiya TamiTakamiya force-pushed the main branch 2 times, most recently from f26a7a1 to 8d092ac Compare January 30, 2025 17:18
@justjais justjais requested a review from TamiTakamiya January 31, 2025 09:48
@justjais justjais merged commit 265d1c6 into main Jan 31, 2025
24 checks passed
@justjais justjais deleted the aap_38439 branch January 31, 2025 13:00